Towards a MapReduce Application Performance Model
Abstract
In the modern age, our ability to generate large data sets far outpaces our capacity for analyzing them. Google’s proposed solution to this fundamental problem – the MapReduce paradigm and runtime system – has recently gained traction in the scientific and “big data” industries. However, the performance characteristics of MapReduce are not well known. This paper builds on the efforts of prior research to more accurately characterize and model the performance of MapReduce applications on large-scale distributed systems.
Related resources
A Model-driven Approach for Price/Performance Tradeoffs in Cloud-based MapReduce Application Deployment
This paper describes preliminary work in developing a model-driven approach to conducting price/performance tradeoffs for Cloud-based MapReduce application deployment. The need for this work stems from the significant variability in both the MapReduce application characteristics and price/performance characteristics of the underlying cloud platform. Our approach involves a model-based machine learn...
Towards Energy Efficient MapReduce
Energy considerations are important for Internet datacenter operators, and MapReduce is a common Internet datacenter application. In this work, we use the energy efficiency of MapReduce as a new perspective for increasing Internet datacenter productivity. We offer a framework to analyze software energy efficiency in general, and MapReduce energy efficiency in particular. We characterize the pe...
Towards an Ontology-Based Semantic Approach to Tuning Parameters to Improve Hadoop Application Performance
Hadoop MapReduce helps companies and researchers process large volumes of data. Hadoop has many configuration parameters that must be tuned to obtain better application performance. However, inexperienced users do not easily find the best parameter settings. Therefore, it is necessary to create environments that promote and motivate information shar...
Towards Control of MapReduce Performance and Availability
MapReduce is a popular programming model for distributed data processing and Big Data applications. Extensive research has been conducted either to improve the dependability or to increase the performance of MapReduce, ranging from adaptive and on-demand fault-tolerance solutions and adaptive task scheduling techniques to optimized job execution mechanisms. This paper investigates a novel solution tha...
Using Realistic Simulation to Identify I/O Bottlenecks in MapReduce Setups
The exponentially growing data demands of modern enterprise and scientific applications pose critical challenges in sustaining these applications at scale. The MapReduce [1] programming model has served as the key enabler for executing resource-intensive applications over huge datasets. However, its configuration design space has not been studied in detail. This is a complex problem as a typical...